Ranking Structured Documents: A Large Margin Based Approach for Patent Prior Art Search

نویسندگان

  • Yunsong Guo
  • Carla P. Gomes
چکیده

We propose an approach for automatically ranking structured documents applied to patent prior art search. Our model, SVM Patent Ranking (SVMPR) incorporates margin constraints that directly capture the specificities of patent citation ranking. Our approach combines patent domain knowledge features with meta-score features from several different general Information Retrieval methods. The training algorithm is an extension of the Pegasos algorithm with performance guarantees, effectively handling hundreds of thousands of patent-pair judgements in a high dimensional feature space. Experiments on a homogeneous essential wireless patent dataset show that SVMPR performs on average 30%-40% better than many other state-of-the-art general-purpose Information Retrieval methods in terms of the NDCG measure at different cut-off positions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

CLEF-IP 2010: Prior Art Retrieval Using the Different Sections in Patent Documents

In this paper we describe our participation in the 2010 CLEF-IP Prior Art Retrieval task where we examined the impact of information in different sections of patent documents, namely the title, abstract, claims, description and IPC-R sections, on the retrieval and re-ranking of patent documents. Using a standard bag-of-words approach in Lemur we found that the IPC-R sections are the most inform...

متن کامل

Query Enhancement for Patent Prior-Art-Search Based on Keyterm Dependency Relations and Semantic Tags

Prior art search is one of the most common forms of patent search, whose goal is to find patent documents that constitute prior art for a given patent being examined. Current patent search systems are mostly keyword-based, and due to the unique characteristics of patents and their usage, such as embedded structure and the length of patent documents, there are rooms for further improvements. In ...

متن کامل

Automatic Prior Art Searching and Patent Encoding at CLEF-IP '10

In the intellectual property field two tasks are of high relevance: prior art searching and patent classification. Prior art search is fundamental for many strategic issues such as patent granting, freedom to operate and opposition. Accurate classification of patent documents according to the IPC code system is vital for the interoperability between different patent offices and for the prior ar...

متن کامل

Learning Translational and Knowledge-based Similarities from Relevance Rankings for Cross-Language Retrieval

We present an approach to cross-language retrieval that combines dense knowledgebased features and sparse word translations. Both feature types are learned directly from relevance rankings of bilingual documents in a pairwise ranking framework. In large-scale experiments for patent prior art search and cross-lingual retrieval in Wikipedia, our approach yields considerable improvements over lear...

متن کامل

TREC Chemical IR Track 2009: A Distributed Dimensional Indexing Model for Chemical Patent Search

For the TREC-2009 Chemical IR Track, we explore development of a distributed information retrieval system based on a dimensional data model. The indexing model supports named entity identification and aggregation of term statistics at multiple levels of patent structure including individual words, sentences, claims, descriptions, abstracts, and titles. The system was deployed across 15 Amazon W...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009